Austrian Online Archive Processing: Analyzing Archives of the World Wide Web

Authors

  • Andreas Rauber
  • Andreas Aschenbrenner
  • Oliver Witvoet
Abstract

With the popularity of the World Wide Web and the recognition that it is worth archiving, numerous projects aim to create large-scale repositories containing excerpts and snapshots of Web data. Interfaces are being created that allow users to surf through time, analyze the evolution of Web pages, or retrieve information through search interfaces. Yet, with the timeline and metadata available in such a Web archive, additional analyses that go beyond mere information exploration become possible. In this paper we present the AOLAP project, which builds a Data Warehouse from such a Web archive and allows its analysis and exploration from different points of view using OLAP technologies. Specifically, technological aspects such as the operating systems and Web servers used, geographic location, and Web technology such as the use of file types, forms, or scripting languages may be used to infer, e.g., technology maturation or impact.

Similar resources

Uncovering Information Hidden in Web Archives: A Glimpse at Web Analysis Building on Data Warehouses

The Internet has turned into an important aspect of our information infrastructure and society, with the Web forming a part of our cultural heritage. Several initiatives thus set out to preserve it for the future. The resulting Web archives are by no means only a collection of historic Web pages. They hold a wealth of information that waits to be exploited, information that may be substantial t...

Analysing and Enriching Focused Semantic Web Archives for Parliament Applications

The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. It provides important and crucial background information, like reactions to political events and comments made by the general public. The case study presented in this paper is driven by two European parliaments (...

Interlinking Media Archives with the Web of Data

Today’s enterprises heavily rely upon accurate, consistent, and timely access to data. However, company data is typically scattered across multiple databases and file shares in a multitude of forms and versions. Moreover, an increasing amount of valuable background information is available outside the companies' influence and control. This situation is typical for many enterprise information in...

A Web-based Interface for On-Demand Processing of Satellite Imagery Archives

We describe a web-based control system for invoking pipelined processes on a large on-line archive of geostationary satellite imagery through a World Wide Web browser interface. Our archive of GMS5 satellite data is stored on a combined RAID and tape silo system, accessible from a cluster of ATM-connected DEC Alpha workstations in Adelaide and Canberra. Our system makes use of parallel and dist...

Knowledge Linking for Online Statistics

The LAWA project investigates large-scale Web (archive) data along the temporal dimension. As a use case, we are studying Knowledge Linking for Online Statistics. Statistic portals such as eurostat's "Statistics Explained" (http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/Main_Page) provide a wealth of articles constituting an encyclopedia of European statistics. Together with it...

Publication date: 2002